A global descriptor of spatial pattern interaction in the galaxy distribution 

Martin Kerscher 1,7, Maria Jesus Pons-Borderia 2, Jens Schmalzing 1,3, Roberto
Trasarti-Battistoni 1,4, Thomas Buchert 1, Vicent J. Martinez 5, and Riccardo Valdarnini 6

ABSTRACT 


We present the function J as a morphological descriptor for point patterns formed

by the distribution of galaxies in the Universe. This function was recently introduced 
in the field of spatial statistics, and is based on the nearest neighbor distribution and 
the void probability function. The J descriptor allows one to distinguish clustered (i.e.
correlated) from "regular" (i.e. anti-correlated) point distributions. We outline the
theoretical foundations of the method, perform tests with a Matern cluster process as 

an idealised model of galaxy clustering, and apply the descriptor to galaxies and loose

groups in the Perseus-Pisces Survey. A comparison with mock-samples extracted from 
a mixed dark matter simulation shows that the J descriptor can be profitably used to 
constrain (in this case reject) viable models of cosmic structure formation. 



Subject headings: methods: statistical; galaxies: clusters: general; large-scale structure 
of universe 



1. Introduction

Three-dimensional patterns formed by the spatial distribution of galaxies in the Universe have 
already been described and quantified by various methods: correlation functions, counts-in-cells 
(Peebles 1993), the void probability function (White 1979), the genus (Melott 1990), the
multifractal spectrum (Martinez et al. 1990), skewness and kurtosis (Gaztañaga & Frieman 1994), and
Minkowski functionals (Mecke et al. 1994, Schmalzing & Buchert 1997). Some of these
descriptors are complementary and suggest a physical interpretation of cosmic patterns by emphasising
different spatial features of the galaxy distribution. 

The treatment of the galaxy distribution as a realization of a spatial point process promises 
useful insights through the application of methods from the field of spatial statistics. The forth- 
coming three-dimensional galaxy catalogues with more than half a million redshifts additionally 
motivate the development of new statistical techniques. 

In this article we want to advocate a morphological measure for the study of the distribution
of galaxies, the J(r)-function, which has recently been introduced into the field of spatial statistics
by van Lieshout & Baddeley (1996) and is related to the nearest-neighbor distribution G(r) and
the spherical contact distribution F(r). Indeed, the J(r)-function is equal to the first conditional
correlation function (Stratonovich 1963, White 1979), and was used by Sharp (1981) to test a
hierarchical ansatz for n-point correlation functions. We will focus on different features of the
J(r)-function, showing its discriminative power as a measure of the strength of clustering.

Our article is organised as follows: In Sect. 2 we present the distribution functions F and G
and show how the J function is constructed. A Matern cluster process is considered as a simple
example of a clustering point distribution. In Sect. 3 we study the clustering properties of a galaxy
sample and of galaxies in groups extracted from the Perseus-Pisces redshift survey (PPS). We
compare the observed galaxy distribution with mock samples extracted from a Mixed Dark Matter
(MDM) simulation in Sect. 4. We summarise and conclude in Sect. 5.



2. The J function 

In the theory of spatial point processes the distribution of a point's distance to its nearest
neighbor is a common tool for the analysis of point patterns (Stoyan et al. 1995). We consider the
redshift space coordinates $\{\mathbf{x}_i\}_{i=1}^{N} \in \mathbb{R}^3$ of $N$ galaxies inside a region $D \subset \mathbb{R}^3$ as a realization of
the point process describing the spatial distribution of galaxies in the Universe.

The nearest neighbor distribution G(r) is the distribution function of the distance r of a point 
of the process to the nearest other point of the process. Similarly, the spherical contact distribution 
F(r) is the distribution function of the distance r of an arbitrary point in $\mathbb{R}^3$ to the nearest point
of the process. F(r) is equal to the volume fraction occupied by the set of all points in D which
are closer than r to a point of the process. Hence, F(r) coincides with the volume density of the
first Minkowski functional (Mecke et al. 1994 and Kerscher et al. 1997) and is related to the void
probability function $P_0(r)$ via $F(r) = 1 - P_0(r)$.
For a homogeneous Poisson process we have

$$F_P(r) = 1 - \exp\left(-\frac{4\pi}{3} r^3 n\right) = G_P(r), \tag{1}$$



where n is the number density. Boundary-corrected estimators for both the nearest neighbor
distribution and the spherical contact distribution used in our studies are provided by minus (reduced
sample) estimators (Stoyan et al. 1995, also detailed in Kerscher et al. 1998).



In a recent paper, van Lieshout & Baddeley (1996) have suggested using the quotient

$$J(r) = \frac{1 - G(r)}{1 - F(r)} \tag{2}$$

for characterising a point process; in that way the surroundings of a point belonging to the process
and the neighborhood of a random point are compared. They consider several point process models
and provide limits and exact results on J(r) (see also Section 2.1).

If the process under consideration is clustered, an arbitrary point usually lies farther away
from a point of the process than in the case of a Poisson process. Hence, clustering is indicated by
$F(r) < F_P(r)$. Consistently, $G(r) > G_P(r)$, since clustered points tend to lie closer to their nearest
neighbors than randomly distributed points. So, for a clustered point distribution, J(r) < 1.

In the case of anti-correlated, "regular" structures the situation is the opposite: on average a point
of a regular process is farther away from the nearest other point of the process, so $G(r) < G_P(r)$,
and a random point is closer to a point of the process, resulting in $F(r) > F_P(r)$. Therefore, regular
structures are indicated by J(r) > 1.

For a homogeneous Poisson process we obtain $J_P(r) = 1$, separating regular from clustering
structures.
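The estimation of F, G and J described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the results in this article: it assumes a rectangular box as the sample region D (the survey geometry is a wedge), uses brute-force distance computations, and estimates F from uniformly thrown test points. The function names and the parameters `n_test` and `seed` are hypothetical. The quotient is formed as J(r) = (1 - G(r)) / (1 - F(r)).

```python
import numpy as np

def nearest_distance(a, b, same=False):
    """Distance from each row of `a` to the nearest row of `b` (brute force)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    if same:
        # a and b are the same point set: ignore the zero self-distance
        np.fill_diagonal(d2, np.inf)
    return np.sqrt(d2.min(axis=1))

def boundary_distance(x, lo, hi):
    """Distance of each point to the boundary of the box [lo, hi]^d (assumed geometry)."""
    return np.minimum(x - lo, hi - x).min(axis=1)

def minus_estimators(points, radii, lo, hi, n_test=2000, seed=0):
    """Reduced-sample (minus) estimates of F(r), G(r) and J(r) = (1-G)/(1-F)."""
    rng = np.random.default_rng(seed)
    test = rng.uniform(lo, hi, size=(n_test, points.shape[1]))
    d_g = nearest_distance(points, points, same=True)  # point -> nearest other point
    b_g = boundary_distance(points, lo, hi)
    d_f = nearest_distance(test, points)               # random location -> nearest point
    b_f = boundary_distance(test, lo, hi)
    F = np.full(len(radii), np.nan)
    G = np.full(len(radii), np.nan)
    for k, r in enumerate(radii):
        keep_g = b_g > r   # only points farther than r from the boundary
        keep_f = b_f > r
        if keep_g.any():
            G[k] = np.mean(d_g[keep_g] <= r)
        if keep_f.any():
            F[k] = np.mean(d_f[keep_f] <= r)
    with np.errstate(divide="ignore", invalid="ignore"):
        J = (1.0 - G) / (1.0 - F)   # ill-defined once F approaches 1
    return F, G, J
```

For a Poisson sample the estimated J(r) should scatter around 1, while values below 1 indicate clustering and values above 1 indicate regularity. Because the estimators discard all points closer than r to the boundary, the usable range of r shrinks with the sample size.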



2.1. The Matern cluster process 

Before attempting to apply J(r) to galaxy samples, we want to test it on a model with
non-trivial yet analytically tractable behaviour of J(r).



In order to describe the clustering of galaxies, Neyman & Scott (1958) suggested a class of
point processes that was subsequently named after them. We concentrate on a subclass called
Matern cluster processes. They are constructed by first distributing M cluster centres uniformly.
Around each cluster centre, which is itself not included in the final point distribution, m galaxies
are placed randomly within a sphere of radius R, where m is a Poisson distributed random variable
with mean $\mu$. In Figure 1 we show a sketch of such a process. Note that overlapping clusters are
allowed.

Fig. 1. Sketch of a two-dimensional Matern cluster process with cluster radius $R = 1.5\,h^{-1}$ Mpc
and mean number of clusters equal to 5.
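The construction just described translates directly into a short simulation. The sketch below is an illustration under stated assumptions rather than the authors' code: the centres are drawn uniformly in a cubic box of side `box` (a hypothetical parameter), points falling outside the box are kept as they are, and the function name is invented for this example.

```python
import numpy as np

def matern_cluster(n_centres, mu, R, box, seed=None):
    """Draw one realization of a 3D Matern cluster process.

    Returns the points, the cluster centres (not part of the pattern),
    and for each point the index of its parent centre.
    """
    rng = np.random.default_rng(seed)
    centres = rng.uniform(0.0, box, size=(n_centres, 3))
    counts = rng.poisson(mu, size=n_centres)      # m ~ Poisson(mu) per cluster
    labels = np.repeat(np.arange(n_centres), counts)
    total = len(labels)
    # Uniform points in a ball: isotropic direction times radius R * u^(1/3)
    direction = rng.normal(size=(total, 3))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    radius = R * rng.uniform(size=(total, 1)) ** (1.0 / 3.0)
    points = centres[labels] + direction * radius
    return points, centres, labels
```

Clusters may overlap, as in the text, because the centres are placed independently of each other. Clusters with m = 0 simply contribute no points.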



For a Matern cluster process, van Lieshout & Baddeley (1996) proved that J(r) is monotonically
decreasing from 1 at r = 0 and attains a constant value for r > 2R, where R is the radius of a
cluster. This constant value can be interpreted as a relic of the uniform distribution of the cluster
centres. In three dimensions

$$J_M(r) = \begin{cases} \dfrac{1}{\mathrm{Vol}(B_R)} \displaystyle\int_{B_R} \exp\bigl(-\mu\, V(x,r,R)\bigr)\, \mathrm{d}^3x & \text{for } 0 \le r \le 2R, \\[2ex] \exp(-\mu) & \text{for } r > 2R, \end{cases} \tag{3}$$

where

$$V(x,r,R) = \frac{\mathrm{Vol}\bigl(B_r(\mathbf{x}) \cap B_R\bigr)}{\mathrm{Vol}(B_R)} \tag{4}$$

denotes the ratio of the volume of the intersection of two balls to the volume of a single ball. Here
$B_r(\mathbf{x})$ is a ball of radius r centred at the point $\mathbf{x}$, while $B_R$ is a ball of radius R centred at the
origin. This quantity can be calculated from basic geometric considerations, both in two (Stoyan
& Stoyan 1994) and in three dimensions, where the result is



$$V(x,r,R) = \begin{cases} c_3 x^3 + c_1 x + c_0 + c_{-1} x^{-1} & \text{for } 0<r<R \text{ and } R-r<x<R, \\ & \text{or } R<r<2R \text{ and } r-R<x<r, \\ (r/R)^3 & \text{for } 0<r<R \text{ and } 0<x<R-r, \\ 1 & \text{for } R<r<2R \text{ and } 0<x<r-R. \end{cases}$$

In Figure 2 we show $J_M(r)$ for $R = 1.5\,h^{-1}$ Mpc and several values of $\mu$; this represents typical
situations of galaxy clustering. Obviously J(r) discriminates between the varying richness classes
of the Matern cluster processes.
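The analytic J for the Matern cluster process can also be evaluated numerically. The sketch below is a hedged illustration: instead of explicit polynomial coefficients, it uses the standard closed-form volume of the lens-shaped intersection of two spheres (an assumption brought in for this example, not taken from the text), and it reduces the volume integral over the ball of radius R to a radial integral over thin shells. The function names and the number of shells `n` are hypothetical.

```python
import numpy as np

def overlap_fraction(x, r, R):
    """V(x, r, R): Vol(B_r(x) ∩ B_R) / Vol(B_R) for centre separation x."""
    x = np.asarray(x, dtype=float)
    V = np.zeros_like(x)
    nested = x <= abs(R - r)           # one ball lies entirely inside the other
    V[nested] = min(r, R) ** 3 / R ** 3
    lens = (~nested) & (x < r + R)     # partial overlap (lens-shaped region)
    xl = x[lens]
    # Standard sphere-sphere intersection volume (assumption, see lead-in)
    vol = (np.pi * (R + r - xl) ** 2
           * (xl ** 2 + 2.0 * xl * (r + R) - 3.0 * (r - R) ** 2) / (12.0 * xl))
    V[lens] = vol / (4.0 * np.pi * R ** 3 / 3.0)
    return V

def J_matern(r, mu, R, n=2000):
    """J(r) of a 3D Matern cluster process with radius R and mean richness mu."""
    if r >= 2.0 * R:
        return float(np.exp(-mu))
    edges = np.linspace(0.0, R, n + 1)
    x = 0.5 * (edges[:-1] + edges[1:])               # shell mid-radii
    w = (edges[1:] ** 3 - edges[:-1] ** 3) / R ** 3  # shell volume fractions, sum to 1
    return float(np.sum(w * np.exp(-mu * overlap_fraction(x, r, R))))
```

Evaluating `J_matern` on a grid of r for several values of `mu` gives curves that start at 1, decrease monotonically and level off at exp(-mu) beyond r = 2R, which is the qualitative behaviour stated above.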



3. Galaxy samples 

In this section we want to go one step further by applying J(r) to catalogues of galaxies and 
groups of galaxies, and compare them with a Matern cluster process. 



8 Throughout this article, distances are given in $h^{-1}$ Mpc, where $h$ denotes the value of the Hubble parameter
measured in units of 100 km s$^{-1}$ Mpc$^{-1}$.



3.1. Description of the PPS galaxy and group samples 

The PPS database was compiled in the last decade (Giovanelli & Haynes 1991, Wegner et al.
1993). The full redshift survey is magnitude-limited down to a Zwicky magnitude of $m_Z < 15.7$
(Zwicky et al. 1968), and at least 95% complete to $m_Z < 15.5$ (see Figure 1 in Iovino et al.
1993). We extract a volume-limited subsample with $M_Z < -19$ and radius $79\,h^{-1}$ Mpc, confined to
$-1^{\rm h}.50 < \alpha < +3^{\rm h}.00$ and $0^\circ < \delta < 40^\circ$, i.e. a solid angle of 0.76 sr. Redshifts are corrected for the
motion of the Sun relative to the rest frame of the Cosmic Microwave Background (CMB) as in
Peebles (1993), and we also correct Zwicky magnitudes for interstellar extinction as in Burstein &
Heiles (1978). The final volume-limited sample PPS79 contains 817 galaxies.
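The quoted solid angle can be checked with a few lines of arithmetic. The sketch below assumes that the right-ascension limits are decimal hours (from -1.50 h to +3.00 h) and that the sample fills the whole wedge out to 79 $h^{-1}$ Mpc; the function name is hypothetical. The volume and mean density derived here are illustrative by-products and are not quoted in the text.

```python
import math

def wedge_solid_angle(ra_min_h, ra_max_h, dec_min_deg, dec_max_deg):
    """Solid angle (sr) of a region bounded in right ascension and declination."""
    d_alpha = math.radians(15.0 * (ra_max_h - ra_min_h))   # 1 hour = 15 degrees
    return d_alpha * (math.sin(math.radians(dec_max_deg))
                      - math.sin(math.radians(dec_min_deg)))

omega = wedge_solid_angle(-1.5, 3.0, 0.0, 40.0)   # close to the quoted 0.76 sr
volume = omega / 3.0 * 79.0 ** 3                  # wedge volume in (h^-1 Mpc)^3
density = 817 / volume                            # mean density in h^3 Mpc^-3
```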



To find groups, we use the redshift space friends-of-friends algorithm of Huchra & Geller
(1982), suitably adapted to our case. It is a truncated percolation algorithm with two independent
linking parameters D and V. Briefly, two galaxies i and j are "friends" if their transverse and radial
separations $r^{\perp}_{ij}$ and $r^{\parallel}_{ij}$ satisfy $r^{\perp}_{ij} < D$ and $r^{\parallel}_{ij} < V/H_0$, respectively. Friendship is transitive, and
a set of three or more friends is called a loose group of galaxies.
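The linking rule and its transitive closure can be sketched with a union-find structure. This is a minimal illustration, not the adapted algorithm used here: it assumes fixed linking parameters, a transverse separation computed from the angular separation and the mean redshift of the pair, and a radial separation given by the redshift difference. The function name and the input format (right ascension and declination in degrees, cz in km/s) are assumptions made for the example.

```python
import numpy as np

def friends_of_friends(ra_deg, dec_deg, cz, D=0.52, V=600.0, H0=100.0, min_members=3):
    """Group galaxies by transitive friendship; distances in h^-1 Mpc for H0 = 100."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    cz = np.asarray(cz, dtype=float)
    unit = np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])
    n = len(cz)
    parent = list(range(n))

    def find(i):
        # Root of the set containing i, with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            theta = np.arccos(np.clip(unit[i] @ unit[j], -1.0, 1.0))
            r_perp = 2.0 * np.sin(theta / 2.0) * 0.5 * (cz[i] + cz[j]) / H0
            r_par = abs(cz[i] - cz[j]) / H0
            if r_perp < D and r_par < V / H0:
                parent[find(i)] = find(j)     # merge the two friendship sets

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_members]
```

The double loop is quadratic in the number of galaxies, which is adequate for a sample of several hundred objects. Only sets with at least `min_members` galaxies are returned, matching the definition of a loose group as three or more friends.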

Usually, loose groups are identified in magnitude-limited samples. Here, we consider only
volume-limited samples. Values of $D = 0.52\,h^{-1}$ Mpc and V = 600 km/s give very good agreement
of global properties (e.g. the total fraction of galaxies in groups, the ratio of groups to galaxies, or
the median velocity dispersion) between our volume-limited group catalogue and the
magnitude-limited catalogue constructed by Trasarti-Battistoni (1998).



The final sample contains 230 galaxies in 48 loose groups. A typical group has 5 observed
members, a "virial mass" of some $10^{13} M_\odot$, and an observed luminosity of some $10^{10} L_\odot$. Both
its radius and its inter-member pairwise separation are around $0.5\,h^{-1}$ Mpc, and the line-of-sight
velocity dispersion amounts to roughly 200 km/s, so the groups appear thin and elongated in
redshift space.



3.2. J(r) for the galaxy samples 

We calculated J(r) for all galaxies from the PPS79 sample; the results are shown in Figure 3.
With J(r) lying outside the area occupied by realizations of a Poisson process, one can clearly see
that galaxies are strongly clustered, which is not a particularly surprising result. In Sect. 4 somewhat
more interesting comparisons with galaxy mock-samples extracted from N-body simulations are
performed.

Fig. 2. In this panel we show $J_M$ for Matern cluster processes with fixed cluster radius
$R = 1.5\,h^{-1}$ Mpc and varying mean number of galaxies per cluster $\mu = 1, 3, 10, 30$ (bending down
successively). The areas indicate 1-$\sigma$ fluctuations of 50 realizations.

Fig. 3. J(r) for all galaxies in the PPS79 sample (solid line) and for 50 realizations of a Poisson
process with the same number of points (dashed area). The dashed area indicates 1-$\sigma$ fluctuations.

Fig. 4. J(r) for the galaxies in groups in the PPS79 sample (solid line), for a Matern cluster
process with $\mu = 5$ and $R = 1.5\,h^{-1}$ Mpc (dashed line), for the average of 50 samples extracted from
all galaxies with the same number density as the galaxies in groups (light shaded area), and for the
average and fluctuations of 50 realizations of a Poisson process (dark shaded area).

Figure 4 displays the results for grouped galaxies. Since each group contains at least three
members, the nearest neighbour of a grouped galaxy is certainly found within the largest link length
used in the friends-of-friends procedure. Hence we observe G(r) = 1 and subsequently J(r) = 0 for
$r > 5.6\,h^{-1}$ Mpc in the grouped galaxy sample. J(r) is in general not invariant under changes of the
number density (van Lieshout & Baddeley 1996). To compare the J(r) for grouped galaxies with
the J(r) for all galaxies, we subsample the denser PPS79. J(r) is calculated from 50 subsamples of
230 galaxies randomly selected from the whole PPS79 sample. With J(r) we measure the strength
of clustering, which is emphasised when we consider galaxies in groups only, and is less pronounced
when we look at the whole sample with field galaxies included. Similarly, the value of J(r) for
the sub-sampled PPS79 is higher than J(r) for the whole PPS79, because random sub-sampling
(thinning) tends to increase J(r) towards the Poisson value.
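The random sub-sampling step can be written compactly. The sketch below is an illustration only; the function name and the `seed` parameter are hypothetical, and each subsample is drawn without replacement, independently of the others.

```python
import numpy as np

def thinned_subsamples(points, size, n_sub=50, seed=None):
    """Return n_sub random subsamples of `size` points each (random thinning)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points)
    return [points[rng.choice(len(points), size=size, replace=False)]
            for _ in range(n_sub)]
```

With the numbers given in the text one would call `thinned_subsamples(pps79, 230, 50)`, where `pps79` stands for a hypothetical (817, 3) array of galaxy positions, and then average an estimate of J(r) over the 50 subsamples.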



The centers of loose groups show a strong correlation themselves (Trasarti-Battistoni et al.
1997); therefore a Matern cluster process can only serve as a rough approximation to the true
distribution of galaxies in groups. Despite this, a Matern cluster process with $\mu = 5$ galaxies per
group (cluster) and a group radius of $R = 1.5\,h^{-1}$ Mpc shows a J(r) comparable to the J(r) obtained
from the galaxies in groups, where on average 4.8 galaxies reside in a group (see Fig. 2). We see
a low, almost constant value of J(r) for $r > 2.5\,h^{-1}$ Mpc. This suggests that we are indeed looking
at highly clustered galaxies with small contamination by "field" galaxies.

The $J_M(r)$ of a Matern cluster process becomes constant for radii larger than twice the cluster
radius. van Lieshout & Baddeley (1996) already expressed the hope of deducing a cluster scale R in
a point distribution from $J(2R) \approx$ const. However, this must be taken with extreme caution. As
can be seen from Fig. 2, we may be fooled by a factor of three by the fluctuations in the estimated
J(r). The uncertainty becomes even worse when we consider certain Cox processes, where J(r)
decreases strictly monotonically towards a constant value (van Lieshout & Baddeley 1996), and
in principle no scale can be deduced from the comparison with the oversimplified Matern cluster
process. Either we have to restrict ourselves to qualitative statements, or come up with more refined
and realistic models.



4. Comparison with N-body simulations

The preceding section showed that the qualitative features of the galaxy distribution are well
described by the J-function. In this section we show that the J-function is also suitable for a
quantitative comparison, and allows us to constrain cosmological models.



4.1. N-body simulations and mock catalogues

We extract 64 mock-PPS catalogues from a cosmological N-body simulation of a Mixed Dark 
Matter (MDM) model. 

We consider a MDM model with one species of massive neutrinos, dimensionless Hubble
parameter h = 0.5 and density parameters $\Omega_c = 0.8$, $\Omega_h = 0.2$ for cold and hot dark matter, respectively.
The analytical expressions for the MDM power spectra P(k) were taken from Ma (1996). The initial
P(k) was normalised to the COBE 4-yr data (Bunn & White 1997), giving a corresponding value
of $\sigma_8 = 0.82$ for the r.m.s. mass fluctuation in an $8\,h^{-1}$ Mpc sphere.

The simulation was run from an initial expansion factor $a_i = 1$ to $a_f = 4.5$ using a P$^3$M
code with $100^3$ particles of mass $1.49 \times 10^{13} M_\odot$, on a cubic grid of $256^3$ cells, with a force softening
radius of $0.32\,h^{-1}$ Mpc, in a box of side $300\,h^{-1}$ Mpc. The integration was performed in comoving
coordinates using a(t) as time variable for a total of 225 steps.
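The particle mass follows from the box size, the particle number and the mean matter density. The short check below assumes a total matter density parameter of 0.8 + 0.2 = 1 and uses the standard value of the critical density, 2.775 x 10^11 h^2 solar masses per cubic Mpc, which is not quoted in the text; the function name is hypothetical.

```python
RHO_CRIT = 2.775e11   # critical density in h^2 M_sun Mpc^-3 (standard value, assumption)

def particle_mass(box_h_mpc, n_particles, omega_m, h):
    """Mass per particle in solar masses for a box of side box_h_mpc (in h^-1 Mpc)."""
    box_mpc = box_h_mpc / h                       # convert to physical Mpc
    total_mass = RHO_CRIT * h ** 2 * omega_m * box_mpc ** 3
    return total_mass / n_particles

m = particle_mass(300.0, 100 ** 3, 0.8 + 0.2, 0.5)   # compare with the quoted 1.49e13
```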



We identify "galaxies" in our simulation with a method similar to the one discussed by Little
& Weinberg (1994):



First, we associate with each particle a number $n_i$ of galaxy-scale peaks calculated from the
initial density contrast field $\delta(\mathbf{x})$. In the peak-background split approximation (Bardeen et al.
1986, White et al. 1987, Park 1991) $n_i$ is the number of galaxy peaks with height $\delta_s(\mathbf{x}_i) > \nu_{th} \sigma_s$,
where $\delta_s(\mathbf{x})$ denotes the field smoothed with a Gaussian kernel of width $R_s = 0.55\,h^{-1}$ Mpc, and $\sigma_s^2$
gives the smoothed field's variance. The field is subject to the constraint that it takes the value
$\nu_b \sigma_b$ when smoothed on a scale $R_b > R_s$ (see Park 1991 for more details). Choosing $\nu_{th} = 0.05$,
at a = 4.5 the particle two-point correlation function, weighted according to $n_i$, matches in slope
and amplitude the galaxy two-point correlation function. For the adopted parameters, the total
number of peaks in the box is $\sum_i n_i \approx 690{,}000$.

Then, we select the i-th particle as a galaxy if $A n_i > p$, where $p \in (0, 1)$ is a uniformly
distributed random variable, and A is a constant of proportionality. The latter is set by the
requirement that the mean number density of "galaxies" in the box matches the mean density of
$M < -19 + 5\log(h)$ galaxies expected from the Schechter luminosity function with $\alpha = -1.15$,
$M_* = -19.3 + 5\log(h)$, $\phi_* = 0.02\,h^3$ Mpc$^{-3}$ appropriate for PPS (Trasarti-Battistoni 1998; Marzke
et al. 1994). This Monte-Carlo procedure makes the implicit assumption that the higher the peak,
the more luminous the associated galaxy.
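The Monte-Carlo selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the peak counts per particle are taken as given, the target number of galaxies is supplied directly instead of being derived from the Schechter function, and the constant A is found by bisection so that the expected number of selected particles, the sum over i of min(1, A n_i), matches the target. All function names are hypothetical.

```python
import numpy as np

def selection_constant(n_peaks, n_target, n_iter=200):
    """Find A such that sum_i min(1, A * n_i) equals n_target (bisection)."""
    n_peaks = np.asarray(n_peaks, dtype=float)
    if n_target > np.count_nonzero(n_peaks):
        raise ValueError("target exceeds the number of particles with peaks")

    def expected(A):
        return np.minimum(1.0, A * n_peaks).sum()

    lo, hi = 0.0, 1.0
    while expected(hi) < n_target:    # enlarge the bracket if necessary
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if expected(mid) < n_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def select_galaxies(n_peaks, A, seed=None):
    """Indices of particles selected as galaxies: A * n_i > p with p uniform in (0, 1)."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(size=len(n_peaks))
    return np.flatnonzero(A * np.asarray(n_peaks, dtype=float) > p)
```

Particles without peaks are never selected, and the actual number of selected galaxies fluctuates around the target from one random draw to the next.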

The mock-PPS catalogues were built as follows. The simulation cube was divided into 64
sub-cubes of side length $75\,h^{-1}$ Mpc. Within each sub-cube we fit a PPS-like wedge of radius $79\,h^{-1}$ Mpc.
Redshift-space coordinates $\alpha$, $\delta$, cz were assigned to all the "galaxies" of the sub-cubes. Finally,
we kept only the "galaxies" within the redshift-space-boundaries of the mock-PPS catalogues.
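The assignment of redshift-space coordinates can be illustrated with a short function. The sketch assumes the usual non-relativistic relation, cz equal to the Hubble velocity plus the line-of-sight component of the peculiar velocity, positions in $h^{-1}$ Mpc with H0 = 100 km/s per $h^{-1}$ Mpc, and an observer at rest; the placement of the observer and the orientation of the wedge inside each sub-cube are not specified here, so the `observer` argument and the function name are assumptions.

```python
import numpy as np

def to_redshift_space(pos, vel, observer, H0=100.0):
    """Convert positions (h^-1 Mpc) and peculiar velocities (km/s) to (alpha, delta, cz)."""
    rel = np.asarray(pos, dtype=float) - np.asarray(observer, dtype=float)
    r = np.linalg.norm(rel, axis=1)
    los = rel / r[:, None]                                   # unit line-of-sight vectors
    v_los = np.einsum("ij,ij->i", np.asarray(vel, dtype=float), los)
    cz = H0 * r + v_los                                      # km/s
    alpha = np.degrees(np.arctan2(rel[:, 1], rel[:, 0])) % 360.0
    delta = np.degrees(np.arcsin(np.clip(los[:, 2], -1.0, 1.0)))
    return alpha, delta, cz
```

A mock catalogue is then obtained by keeping only those objects whose (alpha, delta, cz) fall inside the survey boundaries; selecting on H0 times r instead of cz gives the corresponding real-space sample.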

Although we are looking at a large volume with a depth of $79\,h^{-1}$ Mpc and a solid angle of 0.76 sr,
we observe large fluctuations of 25% in the number of points per mock-sample (Fig. 5). This is
consistent with the large-scale fluctuations of the clustering properties of IRAS galaxies, as found
by Kerscher et al. (1998) out to scales of $200\,h^{-1}$ Mpc, and expresses cosmic variance in agreement
with expected sample-to-sample variations (Buchert & Martinez 1993). As we will see, this slightly
complicates the analysis.

Fig. 5. This histogram displays the number of points per mock-sample; the solid lines give the
values in redshift-space, while the dashed lines correspond to selection in real-space.

Fig. 6. The mean J(r) and the 1$\sigma$ range for the mock-samples with $\Delta = 30$ (solid line, dark
shaded), with $\Delta = 100$ (dotted line, medium shaded), and for all mock-samples (dashed line, light
shaded, the 1$\sigma$ range is plotted symmetrically).



4.2. J(r) for the mock samples

First we investigate the mock-samples selected in redshift-space. If we use all 64
mock-samples we are dominated by the fluctuations between samples with different number densities (see
Fig. 6). Therefore, we restrict ourselves to mock-samples with approximately the same number of
points as in the observed galaxy sample: $N_{gal} - \Delta < N_{mock} < N_{gal} + \Delta$, with $N_{gal} = 817$. For
$\Delta = 30$ only six samples enter, whereas for $\Delta = 100$ we already have seventeen mock-samples to
analyse. The mean value of J(r) hardly changes between samples with different $\Delta$. Obviously,
samples with low density tend to be centred on voids, and high-density samples typically include
large, Coma-like clusters. So large fluctuations in the number density lead to large fluctuations in
the clustering properties measured by J(r) but cancel in the mean. These fluctuations decrease
for smaller $\Delta$ (see Fig. 6; this was confirmed by inspecting samples with $\Delta = 50$ and $\Delta = 200$).
In order to look at structures comparable to the PPS sample we consider mock-samples with a
number density similar to that of the observed galaxy sample, and do not subsample the mock-samples
with high number density.

In Fig. 7 the results of the mock-samples in real- and redshift-space are compared. The
mock-samples selected in redshift-space show weaker clustering than mock-samples selected in
real-space on small scales out to at least $2\,h^{-1}$ Mpc, as can be deduced from the higher J(r). This
can be traced back to redshift space distortions. The peculiar motions act by erasing small scale
clustering; therefore the J value of redshift-space samples is larger (less clustering) than that of
real-space samples. This effect changes at a given distance ($2\,h^{-1}$ Mpc). The same effect was found
by Martinez et al. (1993) in volume-limited subsamples, extracted from CfA-I, by means of the
two-point correlation function.



4.3. Comparison of the PPS galaxies with the mock samples 

In Fig. 8 the results of the mock-samples in redshift-space are compared with the results of
the observed galaxy distribution in the PPS. The mock-samples show insufficient clustering on
small scales out to at least $3\,h^{-1}$ Mpc, as can be deduced from the higher J(r). This is probably
due to the high velocity dispersion in MDM models (Jing et al. 1994). In real-space, which
is not directly comparable with the PPS data, the mock-samples reproduce the clustering on
small scales out to $1\,h^{-1}$ Mpc, but beyond that again show too little clustering, even though they become
marginally consistent with the observed galaxy distribution on larger scales. We have to conclude
that this MDM simulation is unable to reproduce the observed strong clustering of galaxies on small
scales. Of course this result depends on our method of galaxy identification. A different biasing
prescription might change this. On large scales a definitive answer is not possible, since for r larger
than $6\,h^{-1}$ Mpc an estimation of J(r) becomes unreliable; the empirical G(r) and F(r) approach
unity, and the quotient J(r) is ill-defined.

Fig. 7. This plot shows the average value of J(r) and the 1$\sigma$ range for the mock-samples with
$\Delta = 100$ in real-space (solid lines), and for the mock-samples in redshift-space (dotted lines).

Fig. 8. J(r) is shown for the PPS79 galaxy sample (solid line) and the 1$\sigma$ range for the
mock-samples with $\Delta = 100$ in redshift-space (dashed area).



5. Conclusion and Outlook

We have highlighted promising properties of the global morphological descriptor J(r). It 
connects the distribution functions F(r) and G(r) and, hence, incorporates all orders of correlation 
functions. J(r) measures the strength of clustering in a point process and distinguishes between 
correlated and anti-correlated patterns. The example of a Matern cluster process illustrates that 
J(r) sensitively depends on the richness of the clusters or groups. 

Since J(r) is built from cumulative distribution functions, we do not encounter spurious results
due to binning. This becomes particularly important on small scales.

The application of the J-function to galaxies in a volume-limited sample and to a sample of
galaxies in loose groups clearly showed the stronger clustering of galaxies in groups. In a comparison
with a Matern cluster process we found that internal properties, like the richness of loose groups, are
satisfactorily modelled. However, for the large-scale distribution of galaxies, the Matern cluster
process clearly is an over-simplification.

We used the J-function for a comparison of the observed galaxy distribution with galaxy
mock-samples. Although the mock-samples extracted from a MDM-simulation cover a large volume, we
detected large fluctuations of the order of 25% in the number of points per sample. On small scales,
out to $1\,h^{-1}$ Mpc, the clustering in real-space is as strong as in the observed galaxy distribution,
but the comparable redshift-space mock-samples show too weak clustering. On larger scales from
2 to $6\,h^{-1}$ Mpc both real- and redshift-space mock-samples show too weak clustering. Hence, this
MDM simulation is not able to reproduce the observed strong clustering of the galaxies on small
scales.

The function J(r) has proved to have discriminative power comparable to that of the Minkowski
functionals (Kerscher et al. 1998), and is most suitable for addressing the question of "regularity"
on large scales, as demonstrated in an analysis of the distribution of superclusters (Kerscher 1998).
In this article we have shown that the J(r) function is a useful tool for quantifying the clustering of
galaxies on small scales and is capable of constraining cosmological models of structure formation.